Automated Extraction of TAGs from the Penn Treebank

Authors

John Chen

K. Vijay-Shanker

Abstract

The accuracy of statistical parsing models can be improved with the use of lexical information. Statistical parsing using lexicalized tree adjoining grammar (LTAG), a kind of lexicalized grammar, has remained relatively unexplored. We believe that this is in large part due to the absence of large corpora accurately bracketed in terms of a perspicuous yet broad-coverage LTAG. Our work attempts to alleviate this difficulty. We extract different LTAGs from the Penn Treebank. We show that certain strategies yield an improved extracted LTAG in terms of compactness, broad coverage, and supertagging accuracy. Furthermore, we perform a preliminary investigation into smoothing these grammars by means of an external linguistic resource, namely the tree families of an XTAG grammar, a hand-built grammar of English.

Similar resources

Nondeterministic LTAG Derivation Tree Extraction

In this paper we introduce a naive algorithm for nondeterministic LTAG derivation tree extraction from the Penn Treebank and the Proposition Bank. This algorithm is used in the EM models of LTAG Treebank Induction reported in (Shen and Joshi, 2004). Given the trees in the Penn Treebank with PropBank tags, this algorithm generates shared structures that allow efficient dynamic programming in th...

Morphological Features for Parsing Morphologically-rich Languages: A Case of Arabic

We investigate how morphological features in the form of part-of-speech tags impact parsing performance, using Arabic as our test case. The large, fine-grained tagset of the Penn Arabic Treebank (498 tags) is difficult for parsers to handle, ultimately due to data sparsity. However, ad-hoc conflations of treebank tags run the risk of discarding potentially useful parsing information. The main c...

Identifying Verb Arguments and their Syntactic Function in the Penn Treebank

In this paper, we present a tool that allows one to automatically extract verb argument structure from the Penn Treebank as well as from other corpora annotated with the Penn Treebank release 2 conventions. More specifically, we examine each possible sequence of tags, both functional and categorial, and determine whether such a sequence indicates an obligatory argument, an optional argument or a...

Parsing Arabic Using Treebank-based LFG Resources

In this paper we present initial results on parsing Arabic using treebank-based parsers and automatic LFG f-structure annotation methodologies. The Arabic Annotation Algorithm (A) (Tounsi et al., 2009) exploits the rich functional annotations in the Penn Arabic Treebank (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) to assign LFG f-structure equations to trees. For parsing, we modify ...

Sense Tagging the Penn Treebank

This paper describes the methodology that is being used to augment the Penn Treebank annotation with sense tags and other types of semantic information. Inspired by the results of SENSEVAL, and the high inter-annotator agreement that was achieved there, similar methods were used for a pilot study of 5000 words of running text from the Penn Treebank. Using the same techniques of allowing the ann...

Publication date: 2000
